# Video Action Recognition

Videomae Base Finetuned Kinetics 0409 Final 5sec Org Ab7 Val Inside Train

A fine-tuned version of MCG-NJU/videomae-base-finetuned-kinetics for video understanding tasks, reaching 91.38% accuracy on its evaluation set.
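VideoMAE base checkpoints such as MCG-NJU/videomae-base-finetuned-kinetics classify a fixed-length input of 16 frames, so a longer clip has to be subsampled first. The sketch below shows one common choice: uniform sampling of the middle frame of each of 16 equal segments. The helper name `sample_frame_indices` is hypothetical, and the exact sampling used to train these fine-tuned models is not stated in their descriptions.

```python
def sample_frame_indices(num_video_frames: int, clip_len: int = 16) -> list[int]:
    """Return clip_len frame indices spread evenly over a video (hypothetical helper)."""
    if num_video_frames <= 0:
        raise ValueError("video has no frames")
    # Split the video into clip_len equal segments and take the middle frame of each.
    segment = num_video_frames / clip_len
    return [min(int(segment * i + segment / 2), num_video_frames - 1)
            for i in range(clip_len)]


# Example: a 5-second clip at 30 fps has 150 frames.
indices = sample_frame_indices(150)
print(indices)
```

If the video has fewer frames than `clip_len`, some indices repeat, which pads the clip by duplicating frames.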

Videomae Base Finetuned Kinetics 0409 Final 5sec Org Ab7 Val As123 Retry

A video understanding model fine-tuned from MCG-NJU/videomae-base-finetuned-kinetics, reaching 91.23% accuracy on its evaluation set.

Videomae Base Finetuned Ucf101 Subset

A video classification model fine-tuned from the VideoMAE base model on a subset of UCF101.

Videomae Base Finetuned Kinetics 0408 Final 5sec Org Ab7 Val As123

A video action recognition model based on the VideoMAE architecture, fine-tuned on the Kinetics dataset, with an accuracy of 92.25%.

Videomae Base Finetuned Kinetics 0408 Final 45sec Org

A video understanding model fine-tuned from MCG-NJU/videomae-base-finetuned-kinetics, reaching 90.97% accuracy on its evaluation set.

Videomae Base Finetuned Ucf101 Subset

A video understanding model fine-tuned from the VideoMAE base model on a subset of the UCF101 action recognition dataset.

Timesformer Hr Finetuned K600

TimeSformer-HR is a video action recognition model optimized for high-resolution videos and fine-tuned on the Kinetics-600 dataset.

Timesformer Hr Finetuned K400

TimeSformer-HR is a high-resolution spatiotemporal Transformer for video, fine-tuned on the Kinetics-400 dataset for action recognition.
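The HR variants trade compute for resolution. Assuming the input settings of the public TimeSformer checkpoints (8 frames at 224×224 for base, 16 frames at 448×448 for HR, 96 frames at 224×224 for large, all with 16×16 patches), the number of patch tokens per clip can be computed as follows. `timesformer_tokens` is a hypothetical helper, and the [CLS] token is not counted.

```python
def timesformer_tokens(num_frames: int, image_size: int, patch_size: int = 16) -> int:
    """Patch tokens per clip, excluding the [CLS] token (hypothetical helper)."""
    patches_per_frame = (image_size // patch_size) ** 2
    return num_frames * patches_per_frame


# (frames, resolution) assumed from the public TimeSformer checkpoints.
variants = {"base": (8, 224), "hr": (16, 448), "large": (96, 224)}
for name, (frames, size) in variants.items():
    print(f"{name}: {timesformer_tokens(frames, size)} tokens")
```

Under these assumptions, HR processes 8 times as many tokens as base (12,544 vs. 1,568).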

Timesformer Base Finetuned Ssv2

TimeSformer is a Transformer-based video understanding model; this checkpoint is fine-tuned on Something-Something V2 (SSv2), a dataset where recognizing an action depends heavily on temporal order.

Timesformer Base Finetuned K600

TimeSformer is a Transformer-based video understanding model; this checkpoint is fine-tuned on the Kinetics-600 dataset for video classification.

Timesformer Base Finetuned K400

TimeSformer is a Transformer-based video understanding model, specifically fine-tuned on the Kinetics-400 dataset.

Vivit B 16x2 Kinetics400 Finetuned Cctv Surveillance

A video action recognition model based on the ViViT-B/16x2 architecture, starting from a Kinetics-400 checkpoint and fine-tuned for CCTV surveillance footage.
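In ViViT-B/16x2, "16x2" denotes the tubelet used for embedding: 16×16 pixels spatially by 2 frames temporally, with each non-overlapping tubelet becoming one token. Assuming the 32-frame, 224×224 input of the public google/vivit-b-16x2-kinetics400 checkpoint (the CCTV fine-tune's own input size is not stated), the token count works out as below. `tubelet_tokens` is a hypothetical helper.

```python
def tubelet_tokens(num_frames: int, height: int, width: int,
                   tubelet: tuple[int, int, int] = (2, 16, 16)) -> int:
    """Count non-overlapping (frames, height, width) tubelets covering a clip."""
    t, h, w = tubelet
    if num_frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (num_frames // t) * (height // h) * (width // w)


# 32 frames at 224x224 -> 16 temporal slices of 14 x 14 tubelets each.
print(tubelet_tokens(32, 224, 224))
```

VideoMAE uses the same 2×16×16 tubelet shape, which is why a 16-frame VideoMAE clip yields 8 × 14 × 14 = 1,568 tokens.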

Videomae Base Finetuned Kinetics Finetuned Dcsass Shoplifting Subset

A video classification model based on the VideoMAE architecture, fine-tuned to detect shoplifting behavior on a subset of the DCSASS dataset.

Videomae Base Finetuned Kinetics Finetuned Fall Detect

A video action recognition model based on the VideoMAE architecture, fine-tuned for fall detection.

Athit Timesformer 32PS

TimeSformer is a video understanding model based on a spatio-temporal attention mechanism, fine-tuned on the Kinetics-400 dataset for video classification.
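TimeSformer's default "divided space-time attention" runs temporal attention (each patch attends to the patch at the same spatial position in the other frames) followed by spatial attention (each patch attends to the patches of its own frame), rather than joint attention over all tokens at once. A rough sketch of why this is cheaper, counting query-key pairs per layer and ignoring the [CLS] token, assuming the base setting of 8 frames with 196 patches each; `attention_pairs` is a hypothetical helper:

```python
def attention_pairs(num_frames: int, patches_per_frame: int) -> tuple[int, int]:
    """Count query-key pairs per layer for joint vs. divided space-time attention."""
    total = num_frames * patches_per_frame
    joint = total * total  # every token attends to every token
    # Temporal pass: each token attends to num_frames tokens at its spatial position.
    # Spatial pass: each token attends to patches_per_frame tokens in its own frame.
    divided = total * num_frames + total * patches_per_frame
    return joint, divided


joint, divided = attention_pairs(num_frames=8, patches_per_frame=196)
print(joint, divided, round(joint / divided, 1))
```

For this setting, divided attention evaluates roughly 7.7 times fewer query-key pairs than joint attention, and the gap grows with more frames or higher resolution.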

Timesformer Base Finetuned K400 Finetuned Asl

A video classification model fine-tuned from facebook/timesformer-base-finetuned-k400, reaching 96.25% accuracy on its evaluation set.

Timesformer Base Finetuned K400 Continual Lora Ucf101 Continual Lora Ucf101

A video action recognition model based on the TimeSformer architecture, pre-trained on the Kinetics-400 dataset and fine-tuned on the UCF101 dataset.

Timesformer Base Finetuned K400 Continual Lora Ucf101

A video classification model based on the TimeSformer architecture, pre-trained on the Kinetics-400 dataset and fine-tuned on the UCF101 dataset, using LoRA for continual learning.
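LoRA (low-rank adaptation) keeps a pre-trained weight matrix W frozen and learns a low-rank update, so a layer computes h = Wx + (alpha / r)·BAx, with A of shape r × d_in and B of shape d_out × r; B starts at zero so training begins from the original model. The model description does not give its rank, target layers, or continual-learning schedule, so the pure-Python sketch below (hypothetical class `LoRALinear`) only illustrates the general update.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: list[list[float]], rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, d_out x d_in
        # A: r x d_in with small random init; B: d_out x r with zero init.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]


# With B still zero, the layer reproduces the frozen layer exactly.
layer = LoRALinear([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 2.0, 3.0]))
```

Only `a` and `b` would be updated during fine-tuning (no training loop is shown), which is what makes the adapter cheap to store alongside the frozen backbone.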

Timesformer Base Finetuned K400 Finetuned Olimpics Sport Subset

A video action recognition model based on the TimeSformer architecture, pre-trained on the Kinetics-400 dataset and fine-tuned on a subset of Olympic sports videos.

Videomae Small Finetuned Ssv2

VideoMAE is a video model pre-trained with self-supervised masked autoencoding (MAE); this small variant is fine-tuned on the Something-Something V2 dataset for video classification.
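VideoMAE pre-training hides most of the input and trains the model to reconstruct it, using "tube masking" at a very high ratio (90% in the VideoMAE paper): the same spatial positions are masked in every temporal slice, so the model cannot simply copy a patch from a neighboring frame. A minimal sketch of generating such a mask, assuming 16 frames at 224×224 with 2×16×16 tubelets (an 8 × 196 token grid); `tube_mask` is a hypothetical helper, not the reference implementation.

```python
import random


def tube_mask(num_temporal: int, num_spatial: int, mask_ratio: float = 0.9,
              seed: int = 0) -> list[list[bool]]:
    """Return a num_temporal x num_spatial mask where True marks a masked token.

    The same spatial positions are masked in every temporal slice (tube masking).
    """
    rng = random.Random(seed)
    num_masked = int(mask_ratio * num_spatial)
    masked = set(rng.sample(range(num_spatial), num_masked))
    row = [i in masked for i in range(num_spatial)]
    return [list(row) for _ in range(num_temporal)]


# 16 frames / tubelet depth 2 = 8 temporal slices, each with 14 x 14 = 196 positions.
mask = tube_mask(num_temporal=8, num_spatial=196)
print(sum(mask[0]), "of", len(mask[0]), "positions masked in every slice")
```

Only the visible tokens (here 20 positions per slice) would be passed to the encoder, which is a large part of why VideoMAE pre-training is efficient.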

Videomae Base Finetuned Ucf101 Subset

A video classification model fine-tuned on a subset of UCF101 based on the VideoMAE base model

Videomae Base Finetuned Ucf101 Subset

A video understanding model fine-tuned from the VideoMAE base model on a subset of UCF101, reaching 95.71% accuracy.

Videomae Base Finetuned Ucf101 Subset

A video classification model fine-tuned from the VideoMAE base model on a subset of UCF101, reaching 95.22% accuracy.

Videomae Base Short Finetuned Ssv2 Finetuned Rwf2000 Epochs8 Batch8 Fp16

A video action recognition model based on the VideoMAE architecture, pre-trained on the SSv2 dataset and further fine-tuned on the RWF-2000 dataset.

Videomae Base Ssv2 Finetuned Rwf2000

A video understanding model based on the VideoMAE architecture, fine-tuned on the RWF-2000 dataset for violence detection.

Timesformer Large Finetuned K400

TimeSformer is a video classification model based on a spatio-temporal attention mechanism; this large variant is fine-tuned on the Kinetics-400 dataset.

Timesformer Base Finetuned K400

TimeSformer is a video classification model based on a spatio-temporal attention mechanism, fine-tuned on the Kinetics-400 dataset.

Timesformer Hr Finetuned K600

TimeSformer is a video understanding model based on spatio-temporal attention; this high-resolution (HR) variant is fine-tuned on the Kinetics-600 dataset.

Videomae Base Finetuned Ucf101

A video action recognition model fine-tuned from the VideoMAE Base model on the UCF101 dataset.

Videomae Base Finetuned Ucf101 Subset

A video classification model based on the VideoMAE architecture, fine-tuned on a subset of UCF101, with an accuracy of 85.16%.

Timesformer Hr Finetuned K600

TimeSformer is a video classification model based on spatio-temporal attention; this high-resolution variant is fine-tuned on the Kinetics-600 dataset.

AIbase

© 2025 AIbase